Added support for compression on meta device #376


Merged
merged 12 commits into from
Jul 9, 2025

Conversation

shanjiaz
Contributor

@shanjiaz shanjiaz commented Jul 2, 2025

Summary:
This PR adds model compression/decompression support for models instantiated on the meta device, and updates downstream dependencies accordingly. Specifically:

Sparse24BitMaskCompressor:

  • Updated the compress_weight signature to accept an optional module argument for meta support.
  • When compressing meta tensors, creates empty meta placeholders for the compressed weights rather than performing CPU operations.
  • Updated the sparse compression helper functions to handle meta-device tensors gracefully.
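The meta-placeholder behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual compressed-tensors implementation: the function name follows the PR description, but the `module` parameter, output keys, and placeholder shapes are assumptions.

```python
import torch
from typing import Dict, Optional

def compress_weight(
    weight: torch.Tensor,
    module: Optional[torch.nn.Module] = None,  # hypothetical optional arg per the PR
) -> Dict[str, torch.Tensor]:
    if weight.is_meta:
        # Meta tensors carry shape/dtype but no data, so instead of running
        # the CPU 2:4 bitmask kernel, emit empty placeholders with the
        # shapes the compressed representation would have (assumed here).
        rows, cols = weight.shape
        return {
            "compressed": torch.empty((rows, cols // 2), dtype=weight.dtype, device="meta"),
            "bitmask": torch.empty((rows, (cols + 7) // 8), dtype=torch.uint8, device="meta"),
        }
    # Normal path: compute the 2:4 sparsity bitmask and pack the kept values.
    raise NotImplementedError("dense path omitted in this sketch")
```

The key point is that the meta branch never touches tensor data, so the model can be "compressed" structurally before any weights are materialized.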

Quantized Compressors:

  • Refined quantization compression to support saving compressed tensors on meta device.
  • Removed Numpy logic.
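Removing the numpy logic matters because numpy round-trips require real storage, while pure-torch ops can propagate through meta tensors. A minimal sketch of bit-packing in pure torch, assuming 4-bit values and a hypothetical layout (the real pack_to_int32 in compressed-tensors may order bits and handle padding differently):

```python
import torch

def pack_to_int32(values: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    per_word = 32 // num_bits
    rows, cols = values.shape
    assert cols % per_word == 0, "pad columns to a multiple of 32 // num_bits"
    if values.is_meta:
        # No data on meta: return a placeholder with the packed shape only.
        return torch.empty((rows, cols // per_word), dtype=torch.int32, device="meta")
    # Group per_word low-bit values per int32 word and OR them together
    # via disjoint bit shifts (sum works because the bit fields don't overlap).
    v = values.to(torch.int32).reshape(rows, cols // per_word, per_word)
    shifts = torch.arange(per_word, dtype=torch.int32) * num_bits
    return (v << shifts).sum(dim=-1).to(torch.int32)
```

Because every operation here is a torch op, the same code path either runs on real data or short-circuits to a shaped placeholder on meta.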

Test:
Tested with `pytest tests/test_compressors` in compressed-tensors and `pytest tests/quantization/compressed_tensors_integration/` in transformers; all tests passed.

Previously skipped tests in llm-compressor now pass. They will be re-enabled after the transformers PR merges.

Compressing model: 293it [00:00, 3941.09it/s]
test_run_compressed.py::Test_Decompressed_Linear_Uncompressed_Linear_0_commit::test_compressed_matches_decompressed
Decompressing model: 293it [00:00, 337.60it/s]
`run_compressed` is only supported for compressed models. Setting `run_compressed=False`
2025-07-07T12:47:02.525662-0400 | reset | INFO - Compression lifecycle reset
PASSED 2025-07-07T12:47:13.135942-0400 | reset | INFO - Compression lifecycle reset

Compressing model: 293it [00:00, 6702.91it/s]
test_run_compressed.py::Test_Compressed_CompressedLinear_Decompressed_Linear_0_commit::test_compressed_linear_modules_exist
Compressing model: 293it [00:00, 7316.63it/s]
Decompressing model: 293it [00:00, 411.03it/s]
2025-07-07T12:47:15.307218-0400 | reset | INFO - Compression lifecycle reset
PASSED 2025-07-07T12:47:15.309934-0400 | reset | INFO - Compression lifecycle reset

tests/llmcompressor/transformers/compression/test_run_compressed.py::Test_Compressed_CompressedLinear_Decompressed_Linear_0_commit::test_compressed_matches_decompressed__hf_quantizer 2025-07-07T12:47:15.310584-0400 | reset | INFO - Compression lifecycle reset
PASSED 2025-07-07T12:47:25.401371-0400 | reset | INFO - Compression lifecycle reset

Compressing model: 293it [00:00, 2849.25it/s]
test_decompress.py::TestDecompression_0_commit::test_hf_quantizer_decompress_match_manual_decompress
Decompressing model: 293it [00:00, 334.66it/s]
Decompressing model: 154it [00:00, 161.66it/s]
2025-07-07T12:47:48.791590-0400 | reset | INFO - Compression lifecycle reset
PASSED 2025-07-07T12:48:09.605394-0400 | reset | INFO - Compression lifecycle reset

Nightly & e2e all pass as well : )

shanjiaz added 2 commits July 2, 2025 09:56
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
@shanjiaz shanjiaz marked this pull request as ready for review July 2, 2025 16:10
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
shanjiaz added 3 commits July 3, 2025 09:44
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Collaborator

@dsikka dsikka left a comment


Please make sure the test cases in folders ending in _skipped now pass: https://github.com/vllm-project/llm-compressor/tree/main/tests/llmcompressor/transformers/compression

Contributor

@brian-dellabetta brian-dellabetta left a comment


Looking good! One comment, plus a suggestion applied to three different lines.

shanjiaz and others added 4 commits July 7, 2025 10:40
…4_bitmask.py

Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
…4_bitmask.py

Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
…4_bitmask.py

Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Contributor Author

shanjiaz commented Jul 7, 2025

Please make sure the test cases in folders ending in _skipped now pass: https://github.com/vllm-project/llm-compressor/tree/main/tests/llmcompressor/transformers/compression

They're passing now. Logs are pasted in the PR description. 🫡

Contributor

@kylesayrs kylesayrs left a comment


I don't really understand the changes to pack_to_int32. The original function doesn't actually use any explicit numpy calls (outside of start and end), so I don't see why it wouldn't work with meta tensors.

Adding some tests for compressing meta models, as well as for using the compressors with is_meta=True, would help with this.

Also, are the changes from here required?
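For context on the question above, the practical difference is that any numpy round-trip fails on meta tensors while torch-native ops do not. A small illustration (the variable names are ours, not from the PR):

```python
import torch

w = torch.empty(4, 8, device="meta")

# .numpy() needs real storage behind the tensor, so it raises on meta.
numpy_failed = False
try:
    w.numpy()
except Exception:
    numpy_failed = True  # expected: meta tensors have no data to expose

# Torch-native shape/dtype propagation works fine on meta.
placeholder = torch.empty_like(w, dtype=torch.int32)
assert numpy_failed and placeholder.is_meta
```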

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Contributor

@kylesayrs kylesayrs left a comment


Spoke more with @shanjiaz and clarified some things. Tests are correct and passing, and the logic looks correct, nice job!

Member

@rahul-tuli rahul-tuli left a comment


@shanjiaz shanjiaz enabled auto-merge (squash) July 9, 2025 18:18
Collaborator

@dsikka dsikka left a comment


nice job

@shanjiaz shanjiaz merged commit 8f67b97 into main Jul 9, 2025
1 check passed
@shanjiaz shanjiaz deleted the hz-fix-int8-decompression branch July 9, 2025 19:06